An effect size for comparing the strength of morphological integration across studies

Abstract Understanding how and why phenotypic traits covary is a major interest in evolutionary biology. Biologists have long sought to characterize the extent of morphological integration in organisms, but comparing levels of integration for a set of traits across taxa has been hampered by the lack of a reliable summary measure and testing procedure. Here, we propose a standardized effect size for this purpose, calculated from the relative eigenvalue variance, Vrel. First, we evaluate several eigenvalue dispersion indices under various conditions, and show that only Vrel remains stable across samples size and the number of variables. We then demonstrate that Vrel accurately characterizes input patterns of covariation, so long as redundant dimensions are excluded from the calculations. However, we also show that the variance of the sampling distribution of Vrel depends on input levels of trait covariation, making Vrel unsuitable for direct comparisons. As a solution, we propose transforming Vrel to a standardized effect size (Z‐score) for representing the magnitude of integration for a set of traits. We also propose a two‐sample test for comparing the strength of integration between taxa, and show that this test displays appropriate statistical properties. We provide software for implementing the procedure, and an empirical example illustrates its use.


Simulation Protocol
To empirically estimate the variance of Z V rel , we first generated a p × p covariance matrix Σ, whose elements contained input values that would result in a V rel ≈ 0.6. The source code WatanabeCov.r (provided as supporting information) contains functions for obtaining this covariance matrix, following Watanabe (2022). Next we simulated multivariate datasets by drawing independent observations from a normal distribution of N (0, Σ). With this procedure we simulated 1000 multivariate datasets at each of 20 different levels of sample size : N = (10, 35, 60, . . . , 500). For each dataset, we then calculated the associated V rel , found from the set of non-trivial eigenvalues (i.e., λ > 0.0) obtained from the trait covariance matrix. Following the main article, each V rel was then linearly-transformed to the scale (−1.0 → 1.0) as: V * rel = (2 * V rel ) − 1. Next, the transformed values were converted to effect sizes using Fisher's Z-transformation: (Z V rel = 1 2 ln 1+V * rel 1−V * rel ), and the variance across the 1000 effect sizes at each level of N was obtained. These values were then compared to values using the approximation: σ 2 Z = 1/ (N − 3). Finally, the entire simulation was repeated at each of 25 levels of variable number: p = (10, 20, 30, . . . , 250), to determine whether the number of variables affected the expected variation. The variance of Fisher's Z for the bivariate correlation coefficient (ρ) was also estimated using an analogous procedure, to form a basis of comparison for using σ 2 Z = 1/(N − 3) for Z V rel .

Simulation Results
As expected, the sampling variation of Z V rel decreased with increasing N (Fig. 1A). Patterns were largely consistent across values of p, implying that the sampling variation of Z V rel was relatively insensitive to variable number. σ 2 Z = 1/(N − 3) was found to be a reasonable approximation to the empirical variance at large N . It was noted that at small sample sizes, σ 2 Z = 1/(N − 3) overestimated the true variance slightly (Fig. 1A). This was in contrast to what is found with Fisher's Z for the correlation coefficient, where σ 2 Z = 1/(N − 3) matched empirical estimates. Note, however, that for Z V rel , a slight overestimate of its variance will result in statistical tests that are more conservative, as σ 2 Z is found in the denominator of both one-sample and two-sample tests (see main article).
Overall, these results revealed that σ 2 Z = 1/(N − 3) was a reasonable first approximation to the sampling variation of Z V rel across both N and p.  (N ). The different lines (black) correspond to different numbers of variables (p). The expected variance is shown in red. (B) Emprical variance in Z ρ as estimated across simulated datasets for differing levels of sample size (N ). The expected variance is shown in red.

2: Effect of Redundant Dimensions (Rank-Deficiency) on V rel
As described in the main article, an important consideration when estimating the relative eigenvalue variance, V rel , is how to treat those dimensions with zero variation (i.e., for covariance matrices, dimensions whose λ = 0.0). Recent work (O'Keefe et al. 2022) suggests that rank-deficient data dimensions should be excluded from the calculation of eigenvalue dispersion indices when estimating the magnitude of integration. Empirically, there are three primary reasons why redundancies exist in phenotypic datasets; resulting in covariance matrices that are rank-deficient: • When p > N • When one variable is a perfect linear combination of others (e.g., A + B = C) • When data-wide standardizations are performed, resulting in linear dependencies (e.g., landmark configurations are translated, rotated, and scaled) The consequence of any of these is that one or more data dimensions are rendered singular, and thus one or more eigenvalues of the covariance matrix will be equal to zero. With respect to summarizing integration, the question then is whether these trivial dimensions should be included.
We contend that for estimates of integration, calculations of eigenvalue dispersion indices such as V rel should be based on only the non-trivial dimensions of the dataset. In this section, we provide a simple empirical demonstration that complements the suggestion of O'Keefe et al (2022), and shows that when redundant dimensions are retained, the resulting estimates of V rel from covariance matrices are positively biased.

Simulation Protocol
To demonstrate the effect of redundant dimensions on estimates of V rel , we simulated data under three different scenarios. The first contained many more observations than variables (N >> p), with N = 500 and p = 30. The second contained many more variables than observations (p >> N ), with N = 30 and p = 500. The third contained an equal number of observations and variables (N = p) at small sample size, with N = 8 and p = 8. For each scenario we generated a p × p covariance matrix Σ, whose elements contained input values that would result in a V rel ≈ 0.6. The source code WatanabeCov.r (provided as supporting information) contains functions for obtaining this covariance matrix, following Watanabe (2022). Next we simulated 1000 datasets for each scenario by drawing independent observations from a normal distribution of N (0, Σ), based on N and p as specified above. These represented the 'base' datasets for each scenario.
Using the base datasets, a series of rank-deficient datasets were then constructed, by adding one or more dimensions of trivial information to each matrix. This was accomplished by appending additional columns of 0 to each dataset (p 0 = 4, 100, 400). We then obtained the eigenvalues for the covariance matrices for all datasets, calculated its associated V rel , and obtained mean values for each scenario and level of redundancy.

Simulation Results
As seen in Figure 2, when redundant dimensions were included in the calculation of V rel based on covariance matrices, estimates of V rel were always higher than the original value (when using the correlation matrix, V rel decreased). Further, as the number of redundant dimensions increased, so too did the bias in V rel . This pattern was most acute in datasets of small sample size (Fig. 2C).
Additionally, when the correlation matrix was used instead of the covariance matrix, the pattern was reversed.
Here including more and more rank-deficient dimensions to the dataset resulted in a decrease in estimates of V rel (Fig. 3). Here the pattern was more pronounced, and V rel dropped precipitously and approached zero as the number of redundant dimensions increased. This pattern would incorrectly imply that there was no covariation between traits, when in fact it was due to having many redundant dimensions in the dataset.
Overall, this illustration demonstrates that retaining redundant dimensions in the dataset results in a bias in estimates of V rel , and thus in the degree of integration among traits. Further, the direction of this bias depends upon whether the trait covariance matrix or correlation matrix is used. Either way, this demonstration provides empirical justification that redundant dimensions with zero variation must be excluded from calculations of eigenvalue dispersion indices, if those estimates are to be an accurate reflection of the actual patterns in the data (see also O'Keefe et al. 2022).

3: Simulation Scripts for Main Article
In this section, we provide all simulation scripts used to generate the primary results in the main article.

3.3: Statistical Properties of Z V rel
In the main article, we evaluated the statistical performance of two-sample tests based on Z V rel . The following r-scripts were used to generate Figure 5, which contained those results.